Phoneme Alignment using Large Margin Techniques

Authors

  • Joseph Keshet
  • Shai Shalev-Shwartz
  • Yoram Singer
Abstract

We propose an alignment method which is based on recent advances in kernel machines and large margin classifiers for sequences [13, 12], which in turn build on the pioneering work of Vapnik and colleagues [15, 4]. The alignment function we devise is based on mapping the speech signal and its phoneme representation along with the target alignment into an abstract vector-space. Building on techniques used for learning SVMs, our alignment function distills to a classifier in this vector-space which is aimed at separating correct alignments from incorrect ones. We describe a simple iterative algorithm for learning the alignment function and discuss its formal properties. Experiments with the TIMIT corpus show that our method outperforms the best performing HMM-based approach [1].
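
The idea of scoring alignments with a linear classifier over a joint feature vector, and learning it iteratively, can be illustrated with a small sketch. This is not the authors' algorithm or feature set: it uses a generic passive-aggressive style large-margin update, and the 1-D frames, the `PROTOTYPES` table, the two features in `phi`, the `cost` function, and the brute-force search over alignments are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy setup: 1-D "acoustic" frames and one prototype value per phoneme.
PROTOTYPES = {0: 0.0, 1: 1.0, 2: 2.0}

def alignments(n_frames, n_phonemes):
    """Enumerate all start-frame tuples; the first phoneme always starts at frame 0."""
    for rest in combinations(range(1, n_frames), n_phonemes - 1):
        yield (0,) + rest

def phi(x, p, y):
    """Map (signal, phonemes, alignment) to a 2-D vector (assumed features)."""
    ends = y[1:] + (len(x),)
    fit = 0.0  # negative squared distance of each frame to its phoneme prototype
    for phone, start, end in zip(p, y, ends):
        fit -= sum((x[t] - PROTOTYPES[phone]) ** 2 for t in range(start, end))
    jump = sum(abs(x[b] - x[b - 1]) for b in y[1:])  # signal change at boundaries
    return (fit, jump)

def dot(w, v):
    return sum(a * b for a, b in zip(w, v))

def cost(y, y_other):
    """Assumed alignment cost: total boundary displacement in frames."""
    return sum(abs(a - b) for a, b in zip(y, y_other))

def predict(w, x, p):
    """Return the highest-scoring alignment under weights w (brute force)."""
    return max(alignments(len(x), len(p)), key=lambda y: dot(w, phi(x, p, y)))

def train(examples, epochs=5, C=1.0):
    """Passive-aggressive style updates that push correct alignments above wrong ones."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, p, y in examples:
            # Cost-augmented search for the most violating alignment.
            y_hat = max(alignments(len(x), len(p)),
                        key=lambda c: dot(w, phi(x, p, c)) + cost(y, c))
            f_true, f_hat = phi(x, p, y), phi(x, p, y_hat)
            loss = cost(y, y_hat) + dot(w, f_hat) - dot(w, f_true)
            if loss <= 0:
                continue  # margin already satisfied for this example
            delta = [a - b for a, b in zip(f_true, f_hat)]
            norm_sq = dot(delta, delta)
            if norm_sq == 0:
                continue
            tau = min(C, loss / norm_sq)  # step size capped by C
            w = [wi + tau * di for wi, di in zip(w, delta)]
    return w

x = [0.1, 0.0, 0.2, 1.0, 0.9, 1.1, 2.0, 2.1]  # toy frames
p = (0, 1, 2)                                  # phoneme sequence
y = (0, 3, 6)                                  # target start frames
w = train([(x, p, y)])
print(predict(w, x, p))
```

On this toy input the learned weights recover the target alignment. A practical system would replace the exhaustive enumeration with dynamic programming over frames and use far richer acoustic features.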

Similar resources

Phoneme alignment based on discriminative learning

We propose a new paradigm for aligning a phoneme sequence of a speech utterance with its acoustical signal counterpart. In contrast to common HMM-based approaches, our method employs a discriminative learning procedure in which the learning phase is tightly coupled with the alignment task at hand. The alignment function we devise is based on mapping the input acoustic-symbolic representations of...

Precision of phoneme boundaries derived using hidden Markov models

Some phoneme boundaries correspond to abrupt changes in the acoustic signal. Others are less clear-cut because the transition from one phoneme to the next is gradual. This paper compares the phoneme boundaries identified by a large number of different alignment systems, using different signal representations and Hidden Markov Model structures. The variability of the different boundaries is anal...

Evaluating the Pronunciation Component of Text-to-Speech Systems for English: A Performance Comparison

The automatic derivation of word pronunciations from input text is a central task for any text-to-speech system. For general English text at least, this is often thought to be a solved problem, with manually-derived linguistic rules assumed capable of handling ‘novel’ words missing from the system dictionary. Data-driven methods, based on machine learning of the regularities implicit in a large...

An Online Algorithm for Hierarchical Phoneme Classification

Abstract. We present an algorithmic framework for phoneme classification where the set of phonemes is organized in a predefined hierarchical structure. This structure is encoded via a rooted tree which induces a metric over the set of phonemes. Our approach combines techniques from large margin kernel methods and Bayesian analysis. Extending the notion of large margin to hierarchical classifica...

A Novel Approach to Unsupervised Grapheme-to-Phoneme Conversion

Automatic, data-driven grapheme-to-phoneme conversion is a challenging but often necessary task. The top-down strategy implicitly adopted by traditional inductive learning techniques tends to dismiss relevant contexts when they have been seen too infrequently in the training data. This paper proposes instead a bottom-up approach which, by design, exhibits better generalization properties. For e...

Publication date: 2005